Eugene Litwak

I should like to make some comments on specific points raised
by Professor Trow and then advance a multi-model theory of evaluation,
with ensuing predictions as to what type of evaluation strategies
might be ideal for twenty-four "generic" situations.

Objective and "Intuitive" Evaluation Techniques. Professor
Trow provided a good case in point for Robert Stake's view (stated
in his discussion with Glaser) that the degree to which we can provide
good measures is not necessarily related to the importance of
the objects we are trying to evaluate. As a consequence, when we
do not have "objective" measures we may have to utilize crude evaluation
techniques. Insisting on more objective measures may mean no
evaluation at all or one which is quantifiable but a poorer predictor
than qualitative judgments. Thus, Professor Trow points out that
it is difficult to operationalize some of the goals of higher education,
such as the notions of good citizenship and liberal education. These
are goals which might be achieved 10 to 15 years after a person 
leaves college and which involve properties which are difficult to 
measure. The history of evaluation has been one where we have tried 
to introduce quantification into new areas. On the whole, this has 
been beneficial. However, as this movement has gained success the
dangers mentioned by Stake, and suggested in specific detail by Trow,
increase. Current evaluation specialists must increasingly ask themselves
when to use quantitative techniques for evaluation and when to
use more qualitative techniques, rather than assume that quantitative
techniques are invariably better. This argument must be differentiated
from the one in the past where a hard core group resisted all
systematic quantitative evaluation and another insisted on it. In
the second part of the paper we will suggest specific evaluation procedures
where people can make only a gross estimate of their goals.

Daily Effectiveness and Program Results — Two Types of Evaluation. 
Another point made by Trow illustrates something discussed by Lortie 
and Gage in their exchange. Trow points out that it is difficult for 
people to accept evaluations, especially when their jobs are at stake 
(e.g., when someone in a position superior to theirs is involved). A 
host of literature (including an article by Lortie) supports this 
view. In effect, what Trow suggests is that perhaps we should find a 
way to put evaluation in the hands of the people who are doing the job 
or in the hands of their colleagues. I think this touches upon the 
discussion between Gage and Lortie as to where evaluation efforts should 
be made. 

I would suggest that there are two legitimate notions of evaluation
that should be accepted. One is the notion of daily job
effectiveness. The individual uses the daily information to change
his behavior and perform his job more effectively. As mentioned above, 
there does seem to be some evidence that such kinds of evaluations 
require a trusted colleague or individuals themselves to do the evaluation.
The major problem with this kind of an evaluation is that the
evaluator becomes too identified with the individual being evaluated
and in situations of ambiguity is likely to orient the evaluation in
terms of personal welfare rather than around the goals of the organization.
This is commonly recognized in the cry from the "objective"
outside evaluator. It is my view that this second type of evaluation
is also necessary; I would call it an overall program evaluation. It
does involve outside or "impartial" evaluators and the total adminis- 
trative hierarchy. It is also characterized by the fact that it is 
not a daily evaluation but yearly or less frequent. This evaluation 
has all the problems raised by Trow— that people will find it diffi- 
cult to accept— as well as the virtues of being able to take a hard 
look at what is being accomplished. It seems to me that we have two
important problems with regard to evaluation and each of them requires 
a different kind of evaluation. I think that Trow has emphasized only 
one side of the issue. I would suggest that the evaluator must, in 
any given situation, make a diagnosis of the problem. Is he trying to 
find methods for getting teachers to improve their daily efforts 
through some systematic feedback device, or is he interested in
overall program evaluation? Both are legitimate goals and at any
given stage in an educational institution's development he may want
one or the other or both stressed.

Hawthorne Effect and Social Engineering. My next point concerns
what Trow referred to as the "Hawthorne" effect. His point is very
similar to (and paradoxically different from) the remarks made by Gage
in his description of Stephens' work. Gage points out that most
evaluations show that different school programs make little difference
to the students' progress. By contrast, Trow points out that most
experiments in education seem to work. These are not necessarily contradictory
propositions since one is talking about experiments and
the other about established school programs. What is similar about
both of these propositions is that both Stephens and Trow suggest
that the crucial underlying variable is teacher ability and enthusiasm.
These are far more important than program variations. Within a school
system teachers with outstanding abilities are randomly distributed
among the programs. That presumably is why the difference between programs
means so little. Between the experimenters and the non-experimenters
they are not randomly distributed. The experimenters are usually
highly enthusiastic and able. That is why all experiments work.

I would agree that the Hawthorne effect is an important one and 
that we should concentrate on ways for maintaining it continuously. 
However, I think that there are also legitimate problems of social engineering
which might explain why successful experiments cannot be
translated into successful school programs. It seems to me that often
experiments have many hidden complexities, aside from the ability and enthusiasm
of the investigator, which the investigator cannot translate to
a system-wide basis: sometimes because the investigator is not aware
of them, often because there is a lack of knowledge as to how to introduce
innovation into a system (both the letter and the spirit of the
innovation), and often because the system cannot put the kind of resources
used in the experiment into the general application, so that what emerges
is a watered-down version of the experiment.

I would be somewhat pessimistic about our educational establishment
if indeed all that was involved was a "Hawthorne" effect, because
I doubt very much that mass institutions can find sufficient people of
the high calibre and degree of enthusiasm suggested by such analysis.
I would therefore suggest that we concentrate in addition on the organizational
basis for accepting innovation of all kinds rather than on how
to maintain involvement at the highest pitch.

Towards a General Theory of Evaluation. I think throughout this conference
there has been a questioning as to whether there is one ideal
form of evaluation which holds in all situations or whether we have 
different strategies of evaluation for different situations. Glaser 
raised this point quite clearly. I would opt for the latter point of 
view and would now like to review some of the elements which would 
have to be considered and the differential evaluation techniques they 
imply. The variables I am suggesting as being generic and their relationships
to evaluation strategies are as yet very primitive. However,
I do want to go beyond the platitudinous statement that different
situations require different evaluation techniques. With this
limitation in mind, the following are some of the factors which can
be used to differentiate all situations and as a consequence suggest
differential evaluation techniques.

Current State of Knowledge. The "classic" evaluation technique
is very close to the pure experiment or "classic" planning strategies.
The suggestions in all cases tend to be the same. First specify the
goal, then the alternative strategies (i.e., teaching procedures) for
reaching this goal. All the evaluator has to do is to measure the
children before the new program is introduced, measure them after, and
decide which, if any, of the programs shows the most marked difference.
Assumed in this analysis is the ability to define one's goals clearly
(measure their achievement) as well as to specify the range of alternative
means. Professor Trow has pointed out that it is often difficult
if not impossible to measure one's goals or even to specify
them clearly. He might have also added that it is often difficult
if not impossible to specify alternative means. There are various
reasons for this (e.g., there is not enough time, it costs too much,
etc.). However, in this section I want to stress one reason: the
state of knowledge. Is there any theory which systematically suggests
what are the best evaluation strategies when we have incomplete
knowledge? Most of them start out with the premise that before evaluation
can begin we must have excellent states of knowledge. The work of
Dahl and Lindblom, and more recently that of Lindblom, on decision making
strategies provides some useful alternatives. They suggest that,
in situations where things are going reasonably well in the sense that
there are no major calamities, one use an incremental strategy.
This implies introducing innovations which tend to be simply monotonic
projections of past historical trends and which are reversible. This
often means small innovations. If nothing major happens then one continues
this process. Still assuming that one has only a gross specification
of goals and little knowledge of alternative means, they
suggest that an alternative strategy be used when the situation is bad, 
(as judged by gross qualitative evaluation). Thus, a major depression 
or the clear sense of the community that the school procedures are not 
working well in the inner city would be cases in point. In this situa- 
tion they suggest a "calculated risk" strategy. The main point of this 
strategy is that one is to depart as radically as possible from past 
historical trends and pay less attention to the reversibility of the in- 
novation. The reasoning behind this directive is that where things are 
going very badly, little can be lost and much gained by radical shifts 
in methods. 

The important point to be stressed is that they are suggesting 
"rational" strategies in situations where we have incomplete knowledge. 
If their arguments are correct they also suggest criteria for evaluation
under incomplete states of knowledge. What they are saying is
that the evaluator need make only the grossest qualitative assessments
about goals in situations where goals cannot be clearly specified
because of lack of knowledge. Thus, the college faculty must
make a decision right now as to what constitutes requirements for a
liberal arts degree. Yet the goals they seek to achieve (such as
good citizenship and the humanitarian man) cannot be measured right
now with any degree of accuracy. At this point, Dahl and Lindblom
would be suggesting that the evaluator only has to make, in conjunction
with his client, a qualitative judgment as to whether liberal
arts programs have failed or not. If he feels that they have not
obviously failed in the sense that there is no general complaint or
he has some positive general assessment, then he might adopt the incremental
approach. This means he should measure any innovation on
three criteria: does it fit within the historical trend, is it reversible,
and does it have any consequence, based on the same kind of generalized
judgment, which can be thought of as definite failure or success? Alternatively,
if the initial assessment is that the current situation is
very bad then the evaluator uses the "calculated risk" as the basis for 
setting evaluation criteria. In both cases where historical data are 
not available the evaluator might utilize as his comparison group other 
institutions engaged in similar work and in similar circumstances. 

To summarize, what is being said is that where one has relatively
complete information as to goals and means then one can use the traditional
"experimental" before and after evaluation approach. However,
where one lacks knowledge, then one uses only gross judgments on goals
and turns one's attention to the evaluation of a given approach as
being on or off the historical trend line as well as judging the reversibility
of the innovation. The more completely one can develop a
theory of decision making under circumstances of differential states
of knowledge, the more confident one can be about having a general
theory of evaluation that fits the problems that often confront evaluators
(e.g., how to evaluate with incomplete knowledge).
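
The rule summarized above can be written down as a small decision function. This is only an illustrative sketch: the function name `evaluation_criteria`, its two boolean parameters, and the wording of the returned labels are assumptions introduced here, not terms taken from Dahl and Lindblom or from the conference papers.

```python
from typing import List


def evaluation_criteria(knowledge_complete: bool, situation_bad: bool = False) -> List[str]:
    """Return the criteria suggested for judging an innovation.

    The labels are paraphrases of the text; they are not a fixed vocabulary.
    """
    if knowledge_complete:
        # Goals and means are well specified: the traditional
        # "experimental" before-and-after approach applies.
        return ["before-and-after comparison on measured goals"]
    if situation_bad:
        # "Calculated risk": depart from past trends and pay
        # less attention to reversibility.
        return [
            "gross qualitative judgment of success or failure",
            "degree of departure from the historical or comparative trend",
        ]
    # Incremental strategy: stay on the trend line and stay reversible.
    return [
        "gross qualitative judgment of success or failure",
        "fit to the historical or comparative trend",
        "reversibility of the innovation",
    ]


if __name__ == "__main__":
    for label in evaluation_criteria(knowledge_complete=False):
        print(label)
```

The point of the sketch is only that the choice of criteria depends on two prior judgments (the state of knowledge and the gross assessment of the current situation), both of which the evaluator makes before any measurement begins.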

Economic Manpower Scope of Evaluation. Another problem which
emerges in evaluation is the scope of the evaluation procedures.
Should we jump into an evaluation of total systems or should we first
evaluate small experimental programs? It seems to me that one might
move towards small experimental laboratory evaluation procedures where
one has good knowledge (operational measures) of goals and alternative
means but little knowledge as to their relationship. A small laboratory
based evaluation situation permits the investigator to engage in all
kinds of variations with minimal concern for costs. Thus, the general
rule would be that where one is suggesting the use of very costly evaluation
processes and where one has high states of knowledge on means and
goals but not their relationship to each other, the evaluator moves toward
a small experimental model. By contrast, where he has low cost
processes and either high or low states of knowledge he might want to
utilize large scope evaluation procedures (e.g., large field experiments
or surveys). This discussion bears directly on the point that
Alkin was making. Where a technique was extremely costly the evaluator
might either restrict it to small experimental situations or even
say it is not worthwhile studying even if it were the most successful.
Thus, a teaching method which says that there must be one
teacher for every child in the school might be the most successful
teaching technique, yet one which we would not bother to evaluate,
or would evaluate only in a laboratory-like situation, since even with
optimal effectiveness the costs would be too high for any system
to undertake.

Controllability of Independent Variables, and Experimental Versus Survey
Procedures. Another factor which obviously affects the evaluation procedure
is the controllability of the independent variable. Often in the
field of education as well as in social sciences in general it is difficult
to control our independent variables. We are often in the position
of astronomers rather than laboratory experimental physicists. For
instance, we are often in the position of looking at two schools, one
which has a close school-community relationship and the other which
does not. We want to see what difference this makes for the child's
reading skills. However, we are not in a position to get the schools
to alter their procedures systematically. If we are fortunate and can
spot these incipient experiments beforehand, we can do some panel
analysis. If we are unfortunate, we must do a one-shot comparative
survey after the schools have begun their programs. In any case we
must seek to match our populations through statistical manipulation
or stratified procedures rather than relying on random assignments
to experimental and control groups. I think there are at least
three points on the continuum of controllability. Where one has
maximum controllability then one can approach the classic experimental
design. Where one can anticipate changes but not control
them, then one can utilize a panel analysis design and highly purposeful
samples (e.g., natural experiments). When one can neither
anticipate nor control independent variables then one uses a random
sample survey and relies on statistical analysis to provide matched
groups, etc.
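
The three points on the continuum of controllability can be laid out as a lookup table. The sketch below is a hedged illustration: the names `Control`, `DESIGN_BY_CONTROL`, and `suggested_design` are hypothetical, and the design labels are paraphrases of the paragraph above rather than established terminology.

```python
from enum import Enum


class Control(Enum):
    """Three illustrative points on the continuum of controllability."""
    FULL = "can control the independent variable"
    ANTICIPATED = "can anticipate changes but not control them"
    NONE = "can neither anticipate nor control changes"


# (design, sampling approach) for each level of control, as paraphrased
# from the text; the wording of the labels is an assumption.
DESIGN_BY_CONTROL = {
    Control.FULL: ("classic experimental design", "random assignment to groups"),
    Control.ANTICIPATED: ("panel analysis", "purposeful sample (natural experiments)"),
    Control.NONE: ("survey", "random sample with statistically matched groups"),
}


def suggested_design(control: Control) -> str:
    """Return a one-line description of the design for a level of control."""
    design, sampling = DESIGN_BY_CONTROL[control]
    return f"{design}; {sampling}"


if __name__ == "__main__":
    for level in Control:
        print(f"{level.name}: {suggested_design(level)}")
```

A table of this kind treats controllability as if it had exactly three levels; the text says only that there are "at least" three, so finer gradations would need more entries.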

Complexity Versus Simplicity: Experiments and Surveys. Unlike some researchers,
the evaluator is often called upon to evaluate a stimulus
in all of its complexities. By contrast, a researcher faced with a
complex stimulus can at his leisure break it down into its component
parts and study each part separately. He can leave to others the
problem of how these parts might interact with each other. However,
a policy maker might want to know how a given method of teaching will
interact with the various types of teachers he must have in his schools,
the various types of intellectual abilities of the students he confronts
as well as the various types of motivation they bring to the
situation, the various types of socioeconomic groupings of parents
he must deal with, and the generalized community support for such a
program.

Any time the stimulus is a complex one (i.e., consisting of many
independent variables with some causal links to each other as well as
to the dependent variable) the kind of model that Gagne was suggesting
would be difficult to undertake. It would involve an intolerable
number of controlled experiments and might yet miss the overall
causal links between independent variables. In such situations one
might well move to a very large survey or panel study which permitted,
in a relatively short period of time, an examination of many different
combinations of variables. This might not have the logical elegance
that is suggested by the pure experiment but it has the virtue of providing
useful information in a reasonable time.

There is nothing said so far which is very new. However, I would 
suggest that two things derive from the above analysis which might be 
viewed as more controversial. First, on the basis of the reasoning I 
have just gone through, we should forego the notion that there is one 
ideal mode of evaluation and move towards the concept of a multiple 
model. In fact, this is what most evaluators are now doing, and what 
I am suggesting is that rather than viewing this as a departure from 
an ideal norm we view it as an ideal state. This in turn leads to the 
second point: is there some theory which states what type of evaluation
processes are ideal for the various situations which confront
evaluation? Can we show that there are really a limited number of
dimensions which characterize most situations we have to evaluate?
If so, we have a finite number of models of evaluation procedures
rather than an infinite number. The specification of the basic
dimensions for classifying situations as well as their evaluation 
outcome would constitute a multiple model theory of evaluation. 

What I have done in the above section of this paper is suggest some
of the obvious starting points for such a classificatory scheme as 
well as some of the evaluation outcomes. To make this point quite 
clear, these dimensions must now be simultaneously considered and 
the forms of evaluation which ideally emerge from this simultaneous 
interaction specified. 

Table one presents in tabular form my first approximation of a
multiple model theory of evaluation. This theory is based on all 
possible combinations of the following simple principles. 

I. Complete knowledge of ends and means permits true experimental
evaluations and the purposeful sampling of individuals where
necessary (e.g., a priori matched groups).

Incomplete knowledge of ends and means generally precludes the
use of experimental designs, requiring either survey or panel
analysis type instruments and requiring random selection of
populations. Where the overall lack of knowledge is coupled
with the gross evaluation that the situation is all right, then
the evaluator seeks comparative data (i.e., either historical or
not) with the goal in mind of judging any innovation in terms of
its continuity and reversibility. Where this lack of knowledge
is coupled with a gross evaluation that the current state is very
bad, then comparative data are examined to see how far the new innovation
departs from the old.

II. Where the evaluation process is costly (in terms of time, manpower,
or general economic resources) then small laboratory
evaluation procedures are desirable. Where the evaluation process
is not costly then large scale surveys or field experiments
are possible.

III. Where the evaluator has complete control over the stimulus he can
use experimental designs, where he has only partial control he
needs to use partial experimental designs like panel analysis,
while where he has no control he must use techniques like survey
analysis.

IV. Where the stimulus to be examined is very simple it provides an
ideal situation for small group experiments, whereas if the stimulus
is very complex (there are many independent variables and they
are related to each other in a causal sequence) then large surveys
or panel studies will generally be necessary.

With this in mind, we can look at cell number 1 in our table. According
to our multiple model theory of evaluation this is the situation
where the evaluator should use a small experimental laboratory study
to do his evaluation because he has fairly good knowledge of the means
and ends, the stimuli (means) are very simple, it would be costly to
do the evaluation on a large scale, and he is able to control the
stimulus. By contrast, if the evaluator is in a situation described
by cell 24 he would use large scale surveys with random samples. This
is true because he lacks knowledge to operationalize the ends, he cannot
control the stimulus, he assumes the stimulus is complex, and he
can collect much data inexpensively. These conditions prevent him
from setting up an experimental laboratory evaluation or even seeking
a natural experiment. At the same time they put a premium on gathering
much information (e.g., complex stimulus, lack of knowledge, and
low costs).

The reader will note that cells 13, 16, 19, and 22 are all considered
to be logically impossible. It is argued that in situations
where there is incomplete knowledge of ends and means, one cannot (by
definition) control the means (stimulus). If we examine cell 12 we
find an interesting mixture which in turn suggests a slightly different
kind of evaluation method. This is a situation where there is
knowledge of ends and means but where the investigator cannot control
the stimulus. This is a typical problem of astronomy. In addition,
the stimulus is very complex, which tends to suggest the use of a large
survey, and this is further reinforced by the low cost of the evaluation.
However, this survey can differ from the survey discussed in cell 24
because here the investigator has much more knowledge of the ends and
means. He can put this knowledge to use in his sampling procedures.
He can either sample to insure that he has incorporated natural experiments
or he can stratify his sample to insure that he has relatively
equal numbers of cases for all of his major variables. Cell 6 is like
cell 12 except we now have a situation where the costs of the evaluation
are high. In such circumstances the size of the sample will probably
shrink so we now have a medium rather than a large survey. Cell 11
is also like cell 12 but it differs in that the investigator has some 
control but not complete control over his environment. This suggests 
that he might be on the scene before a natural experiment is begun 
and thus he might be able to get before and after measures and do a 
panel analysis though not have a true experiment. Cell 5 is just like 
cell 11 but involves a more costly evaluation so we would suggest the 
chief thing differentiating them would be the size of the panel study. 
Cell 10 is like cell 12 but here the investigator has control over 
his environment. This permits an experiment but the large number of 
variables would suggest that he might not be able to do all possible 
experiments nor would many single experiments necessarily unravel the 
interactions between the independent variables. Since this cell also 
states that we are now dealing with a low cost situation, it would seem
to us that a large field experiment coupled with much interview data 
would be appropriate. The experimental design will permit one to test 
out some of the variables through experimentation while the use of the 
survey and consequent panel analysis would permit one to use statistical
analysis to deal with the more obscure variables and the more
intricate set of interactions. Cell 4 would be like cell 10 but for
the increased cost. This may mean smaller field experiments or the
use of many laboratory experiments as the chief evaluation procedure.
The reader will recall that we said that cell 1 was the ideal
situation for a small laboratory experiment. We think that cell 7
would be the ideal situation for a small field experiment. It is
exactly like cell 1 but there is little cost in doing the field
experiment, so it should be done because it often means one less
inference for the evaluator (e.g., will the laboratory results hold
in the field). The reason that this field experiment can be small,
whereas cell 10, which is very close to cell 7, must involve large
field experiments, is that cell 7 has a single or simple stimulus.
The reasoning for cells 2 and 8 follows that for 5 and 11, with the
difference being in a simple rather than complex stimulus. Similarly,
3 and 9 follow 6 and 12.
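
The cell numbers used in this discussion are consistent with one simple ordering of the four dimensions, which can be checked mechanically. The ordering below is inferred from the cell descriptions in the text and is an assumption about how the table is laid out; the names `cell_number`, `is_impossible`, and the dimension labels are likewise introduced here only for illustration.

```python
from itertools import product

# Dimension values in the order inferred from the cell descriptions.
KNOWLEDGE = ("complete", "incomplete")   # cells 1-12 versus cells 13-24
COST = ("high", "low")                   # e.g., cell 1 versus cell 7
STIMULUS = ("simple", "complex")         # e.g., cell 7 versus cell 10
CONTROL = ("full", "partial", "none")    # e.g., cells 10, 11, 12


def cell_number(knowledge: str, cost: str, stimulus: str, control: str) -> int:
    """Map one combination of the four dimensions to a cell number (1-24)."""
    return (
        1
        + CONTROL.index(control)
        + 3 * STIMULUS.index(stimulus)
        + 6 * COST.index(cost)
        + 12 * KNOWLEDGE.index(knowledge)
    )


def is_impossible(knowledge: str, control: str) -> bool:
    """Full control of the stimulus is ruled out under incomplete knowledge."""
    return knowledge == "incomplete" and control == "full"


# Enumerate every combination and collect the logically impossible cells.
cells = {
    cell_number(k, c, s, ctl): (k, c, s, ctl)
    for k, c, s, ctl in product(KNOWLEDGE, COST, STIMULUS, CONTROL)
}
impossible = sorted(n for n, (k, _, _, ctl) in cells.items() if is_impossible(k, ctl))

if __name__ == "__main__":
    print(len(cells), "cells;", "impossible:", impossible)
```

Under this assumed ordering, two knowledge states, two cost levels, two stimulus types, and three levels of control give the twenty-four cells, and the four combinations of incomplete knowledge with full control fall on the cells the text names as logically impossible.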

If we now examine the opposite side of the table where we have 
incomplete knowledge, it has already been noted that cell 24 differs 
from cell 12 (which matches it except there is complete knowledge) 
in having a random sample rather than a purposeful sample. In addition,
this theory suggests that where incomplete knowledge is
coupled with a positive gross evaluation of current activities the
evaluator will utilize his statistical techniques to look at the method
being evaluated historically (through retrospective questions) or comparatively
with similar organizations, and all assessments will be guided
in terms of their "fit" to historical or comparative trends. In addition
the methods will be evaluated in terms of their reversibility.
In contrast, if the gross evaluation is that the current situation is
very bad, then the evaluator, using the same comparative data, will see
how far the innovation departs from the historical or comparative standards.
Cell 23 is almost the same as cell 24 but here there is some
partial state of control over the stimulus. However, it suggests that
the control may not be quite as great as in cell 11, which is the same except
for the knowledge base. Therefore, it is suggested that here we
might have some kind of simulated panel study, through use of cohort
analysis and possibly two cross-sectional surveys taken at two different
periods of time but not with the same people. Using statistical
devices one can have a simulated panel design. Cell 18 would
be like cell 24 but requires a medium sized survey because of the
cost factor and cell 17 would be the same as cell 23 but smaller in
size because of the cost factor. Cell 21 would be like cell 24 but, because
of the assumed simple stimulus, would require a smaller sample
size, while cell 20 would be a smaller version of cell 23 for the same
reason. Cell 14 would, because of cost, probably be like cell 20
but even smaller, while cell 15 would be like 21 but smaller.

This now completes the provisional analysis. We have generated
almost 24 different types of evaluation techniques. No attempt is
made to argue that this is where an evaluation theory will eventually
lead. However, it does illustrate in more detailed terms what we
mean when we say there must be a multi-model theory of evaluation.
Hopefully, this initial formulation, crude as it may be, will encourage
others to pursue this inquiry more deeply.



